Load the tweets and check that they were loaded correctly. We also inspect the summary for a first interpretation. The summary(tweets) output reveals the following:
# Set working directory
# getwd()
# setwd("./data/")
# Load data
load("./Tweets_all.rda")
# Check that tweets are loaded
head(tweets)
## # A tibble: 6 × 14
## created_at id id_str full_text in_reply_to_screen_n…¹
## <dttm> <dbl> <chr> <chr> <chr>
## 1 2023-01-20 17:17:32 1.62e18 1616469988369469… "Im MSc … <NA>
## 2 2023-01-13 07:52:01 1.61e18 1613790954737074… "Was bew… <NA>
## 3 2023-01-12 19:30:01 1.61e18 1613604227141537… "Was uns… <NA>
## 4 2023-01-12 08:23:00 1.61e18 1613436367169634… "Eine di… <NA>
## 5 2023-01-11 14:00:05 1.61e18 1613158809081450… "Wir gra… <NA>
## 6 2023-01-10 17:06:11 1.61e18 1612843252083834… "Unsere … <NA>
## # ℹ abbreviated name: ¹in_reply_to_screen_name
## # ℹ 9 more variables: retweet_count <int>, favorite_count <int>, lang <chr>,
## # university <chr>, tweet_date <dttm>, tweet_minute <dttm>,
## # tweet_hour <dttm>, tweet_month <date>, timeofday_hour <chr>
summary(tweets)
## created_at id id_str
## Min. :2009-09-29 14:29:47.0 Min. :4.469e+09 Length:19575
## 1st Qu.:2015-01-28 15:07:41.5 1st Qu.:5.604e+17 Class :character
## Median :2018-04-13 13:26:56.0 Median :9.848e+17 Mode :character
## Mean :2017-12-09 15:26:50.7 Mean :9.400e+17
## 3rd Qu.:2020-10-20 10:34:50.0 3rd Qu.:1.318e+18
## Max. :2023-01-26 14:49:31.0 Max. :1.619e+18
## full_text in_reply_to_screen_name retweet_count favorite_count
## Length:19575 Length:19575 Min. : 0.000 Min. : 0.00
## Class :character Class :character 1st Qu.: 0.000 1st Qu.: 0.00
## Mode :character Mode :character Median : 1.000 Median : 0.00
## Mean : 1.289 Mean : 1.37
## 3rd Qu.: 2.000 3rd Qu.: 2.00
## Max. :267.000 Max. :188.00
## lang university tweet_date
## Length:19575 Length:19575 Min. :2009-09-29 00:00:00.00
## Class :character Class :character 1st Qu.:2015-01-28 00:00:00.00
## Mode :character Mode :character Median :2018-04-13 00:00:00.00
## Mean :2017-12-09 02:25:45.00
## 3rd Qu.:2020-10-20 00:00:00.00
## Max. :2023-01-26 00:00:00.00
## tweet_minute tweet_hour
## Min. :2009-09-29 14:29:00.00 Min. :2009-09-29 14:00:00.00
## 1st Qu.:2015-01-28 15:07:00.00 1st Qu.:2015-01-28 14:30:00.00
## Median :2018-04-13 13:26:00.00 Median :2018-04-13 13:00:00.00
## Mean :2017-12-09 15:26:24.68 Mean :2017-12-09 14:59:43.81
## 3rd Qu.:2020-10-20 10:34:30.00 3rd Qu.:2020-10-20 10:00:00.00
## Max. :2023-01-26 14:49:00.00 Max. :2023-01-26 14:00:00.00
## tweet_month timeofday_hour
## Min. :2009-09-01 Length:19575
## 1st Qu.:2015-01-01 Class :character
## Median :2018-04-01 Mode :character
## Mean :2017-11-24
## 3rd Qu.:2020-10-01
## Max. :2023-01-01
Next we preprocess the tweets; calculating the posting intervals requires some additional derived properties. The preprocessing transforms the raw tweet data into a structured format suitable for analysis:
# Preprocessing step: convert created_at to POSIXct and derive date, weekday, year, and month; cast university to character; replace emojis with their names. Weekday labels follow the system locale, with the week starting on Monday.
tweets <- tweets %>%
mutate(
created_at = as.POSIXct(created_at, format = "%Y-%m-%d %H:%M:%S"),
date = as.Date(created_at),
day = lubridate::wday(created_at,
label = TRUE, abbr = FALSE,
week_start = getOption("lubridate.week.start", 1),
locale = Sys.getlocale("LC_TIME")
),
year = year(created_at),
month = floor_date(created_at, "month"),
university = as.character(university),
full_text_emojis = replace_emoji(full_text, emoji_dt = lexicon::hash_emojis)
)
# Helper function to remove emoji tags
# replace_emoji() inserts each emoji into the text as its name plus placeholder tags; we strip the tags here
remove_emoji_tags <- function(text) {
str_remove_all(text, "<[a-z0-9]{2}>")
}
# Remove Emoji Tags
tweets$full_text_emojis <- sapply(tweets$full_text_emojis, remove_emoji_tags)
# Store emojis in a separate column to analyze later
tweets$emoji_unicode <- tweets %>%
emoji_extract_nest(full_text) %>%
select(.emoji_unicode)
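As a quick sanity check of the tag-removal step (a toy string; the two-character byte tags are illustrative of what replace_emoji() can leave behind), the regex strips every `<..>` tag and nothing else:

```r
library(stringr)

# Hypothetical tagged text; "<f0><9f><98><80>" stands in for an emoji's byte tags
tagged <- "Great news <f0><9f><98><80> today"
cleaned <- str_remove_all(tagged, "<[a-z0-9]{2}>")
cleaned # "Great news  today" (only the tags are gone)
```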
Each university has a distinct peak hour for tweeting, usually falling within typical working hours (9 AM to 5 PM). This suggests a strategic approach of reaching the target audience when it is most likely online; the most active hours per university are computed below.
These peak times indicate that a typical worker is most active on Twitter in the morning, with engagement waning around midday and continuing to decline until the end of the workday.
There isn’t a consistent “most active day” across universities. Some favor weekdays, while others show higher activity on weekends. This could reflect differences in their target audience or the nature of their content.
The pattern also suggests that tweet activity tends to be higher earlier in the week, with motivation and tweet frequency potentially falling as the week progresses.
# Count each tweet by university and hour of the day
tweet_counts_by_hour_of_day <- tweets %>%
group_by(university, timeofday_hour) %>%
count() %>%
arrange(university, timeofday_hour)
# Plot the number of tweets by university and hour of the day
ggplot(
tweet_counts_by_hour_of_day,
aes(
x = timeofday_hour, y = n,
color = university, group = university
)
) +
geom_line() +
facet_wrap(~university) +
labs(
title = "Number of tweets by university and hour",
x = "Hour of day",
y = "Number of tweets"
)
# Show most active hours for each university
hours_with_most_tweets_by_uni <- tweet_counts_by_hour_of_day %>%
group_by(university, timeofday_hour) %>%
summarize(total_tweets = sum(n)) %>%
group_by(university) %>%
slice_max(n = 1, order_by = total_tweets)
print(hours_with_most_tweets_by_uni)
## # A tibble: 8 × 3
## # Groups: university [8]
## university timeofday_hour total_tweets
## <chr> <chr> <int>
## 1 FHNW 09 344
## 2 FH_Graubuenden 11 493
## 3 ZHAW 17 580
## 4 bfh 08 497
## 5 hes_so 10 315
## 6 hslu 09 380
## 7 ost_fh 08 44
## 8 supsi_ch 11 330
# Show most active hour overall
hour_with_most_tweets <- tweet_counts_by_hour_of_day %>%
group_by(timeofday_hour) %>%
summarize(total_tweets = sum(n)) %>%
arrange(desc(total_tweets)) %>%
slice_max(n = 1, order_by = total_tweets)
print(hour_with_most_tweets)
## # A tibble: 1 × 2
## timeofday_hour total_tweets
## <chr> <int>
## 1 11 2356
# Count each tweet by university and weekday
tweet_counts_by_week_day <- tweets %>%
group_by(university, day) %>%
count() %>%
arrange(university, day)
# Plot the number of tweets by university and day of the week
ggplot(
tweet_counts_by_week_day,
aes(
x = day, y = n,
color = university,
group = university
)
) +
geom_line() +
facet_wrap(~university) +
labs(
title = "Number of tweets by university and day of the week",
x = "Day of the week",
y = "Number of tweets"
)
# Show most active days for each university
days_with_most_tweets_by_uni <- tweet_counts_by_week_day %>%
group_by(university, day) %>%
summarize(total_tweets = sum(n)) %>%
group_by(university) %>%
slice_max(n = 1, order_by = total_tweets)
print(days_with_most_tweets_by_uni)
## # A tibble: 8 × 3
## # Groups: university [8]
## university day total_tweets
## <chr> <ord> <int>
## 1 FHNW Tuesday 575
## 2 FH_Graubuenden Tuesday 751
## 3 ZHAW Wednesday 636
## 4 bfh Tuesday 651
## 5 hes_so Tuesday 415
## 6 hslu Thursday 603
## 7 ost_fh Friday 65
## 8 supsi_ch Friday 461
# Combine the most active hours and days for each university to show heatmap
heatmap_data <- tweets %>%
group_by(timeofday_hour, day) %>%
count() %>%
ungroup()
# Plot heatmap and we can see clearly that the most tweets are posted during the working hours from monday to friday
ggplot(heatmap_data, aes(x = timeofday_hour, y = day, fill = n)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(
title = "Heatmap of Tweet Activity by Hour and Day",
x = "Hour of Day",
y = "Day of the Week",
fill = "Number of Tweets"
) +
theme_minimal()
While universities have peak hours and days, the intervals between tweets vary considerably, pointing to a reactive strategy rather than a rigid release schedule.
To further understand this dispersion, we analyze the time intervals between consecutive tweets using the mean interval, its standard deviation, and the entropy of the interval distribution. High variability indicates that a university posts at irregular intervals rather than on a strict schedule, and higher entropy means less predictable timing; both suggest a dynamic posting strategy that responds to real-time events or trends rather than following a predetermined plan. Here are the results for selected universities:
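To build intuition for the entropy measure, here is a minimal base-R sketch of binned Shannon entropy (an illustration only, not the exact `entropy::entropy(discretize(...))` estimator used below): perfectly regular intervals all fall into one bin and yield zero entropy, while sporadic intervals spread across bins and yield a positive value.

```r
# Binned Shannon entropy (natural log) over 30 equal-width bins
shannon_entropy <- function(x, bins = 30) {
  p <- table(cut(x, breaks = bins)) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

regular   <- rep(60, 100)                       # one tweet every 60 minutes
irregular <- c(1, 5, 45, 200, 800, 1500, 4000)  # sporadic gaps in minutes

shannon_entropy(regular)    # 0: all intervals land in a single bin
shannon_entropy(irregular)  # positive: intervals spread over several bins
```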
tweets <- tweets %>%
arrange(university, created_at) %>%
group_by(university) %>%
mutate(time_interval = as.numeric(
difftime(created_at, lag(created_at), units = "mins")
))
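On toy timestamps (invented for illustration), the lag/difftime construction yields one gap per tweet, with NA for the first tweet of each group:

```r
library(dplyr)

# Three tweets from one account; tz fixed to UTC to avoid DST effects
times <- as.POSIXct(c("2023-01-01 08:00:00",
                      "2023-01-01 08:45:00",
                      "2023-01-02 09:00:00"), tz = "UTC")
# Minutes since the previous tweet; the first tweet has no predecessor
gaps <- as.numeric(difftime(times, lag(times), units = "mins"))
gaps # NA 45 1455
```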
# Plotting the time intervals
universities <- unique(tweets$university)
for (uni in universities) {
uni_filtered_data <- tweets %>%
filter(university == uni)
# Plot the distribution of time intervals
print(ggplot(uni_filtered_data, aes(x = time_interval)) +
geom_histogram(fill = "lightblue", bins = 30) +
facet_wrap(~year) +
labs(
title = paste0("Distribution of time intervals between tweets - ", uni),
x = "Time interval (minutes)",
y = "Tweet count"
))
# Also plot monthly activity per year: an interval can look "short" even when a university posts only a few times a year, which would make it appear active when it actually is not
tweet_counts <- uni_filtered_data %>%
group_by(tweet_month) %>%
summarise(tweet_count = n())
print("Tweet count for each month")
print(ggplot(tweet_counts, aes(x = tweet_month, y = tweet_count)) +
geom_line(color = "#F9C301") +
geom_point(color = "#37556E", size = 3) +
scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") +
labs(
title = "Monthly Tweet Activity",
x = "Month",
y = "Tweet Count"
) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)))
# Calculate dispersion measures for time intervals
dispersion_measures <- uni_filtered_data %>%
group_by(year) %>%
summarise(
mean_interval = mean(time_interval, na.rm = TRUE),
sd_interval = sd(time_interval, na.rm = TRUE),
entropy_interval = entropy::entropy(discretize(time_interval,
numBins = 30, r = range(time_interval, na.rm = TRUE)
))
)
print(paste("Dispersion measures for", uni))
print(dispersion_measures)
# Line plot for mean interval and standard deviation over the years
print(ggplot(dispersion_measures, aes(x = year)) +
geom_line(aes(y = mean_interval, color = "Mean Interval")) +
geom_point(aes(y = mean_interval, color = "Mean Interval")) +
geom_line(aes(y = sd_interval, color = "Standard Deviation")) +
geom_point(aes(y = sd_interval, color = "Standard Deviation")) +
scale_y_continuous(sec.axis = dup_axis()) +
labs(
title = "Mean Interval and Standard Deviation of Tweet Intervals Over Years",
x = "Year",
y = "Minutes",
color = "Measure"
) +
theme_minimal())
print(ggplot(dispersion_measures, aes(x = year)) +
geom_line(aes(y = entropy_interval, color = "Entropy Interval")) +
geom_point(aes(y = entropy_interval, color = "Entropy Interval")) +
scale_y_continuous(sec.axis = dup_axis()) +
labs(
title = "Entropy of Tweet Intervals Over Years",
x = "Year",
y = "Entropy"
) +
theme_minimal())
}
## [1] "Tweet count for each month"
## [1] "Dispersion measures for FHNW"
## # A tibble: 13 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2011 6994. 4215. 0.693
## 2 2012 2120. 2893. 2.01
## 3 2013 1291. 2668. 1.29
## 4 2014 3238. 4378. 2.17
## 5 2015 4695. 7894. 1.81
## 6 2016 5382. 7382. 2.21
## 7 2017 3696. 6566. 1.61
## 8 2018 1514. 3440. 0.748
## 9 2019 1657. 2226. 1.53
## 10 2020 1744. 1670. 2.36
## 11 2021 2022. 2269. 2.36
## 12 2022 1830. 1968. 2.35
## 13 2023 2160. 2580. 2.15
## [1] "Tweet count for each month"
## [1] "Dispersion measures for FH_Graubuenden"
## # A tibble: 15 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2009 2313. 3692. 2.00
## 2 2010 33181. 63719. 1.39
## 3 2011 6164. 8193. 1.97
## 4 2012 2501. 4219. 1.64
## 5 2013 3197. 5529. 1.76
## 6 2014 1254. 3569. 0.578
## 7 2015 1088. 1521. 1.98
## 8 2016 1448. 1726. 1.89
## 9 2017 2227. 2462. 2.14
## 10 2018 2384. 3034. 1.88
## 11 2019 3019. 3146. 2.38
## 12 2020 3011. 2764. 2.51
## 13 2021 2913. 2723. 2.78
## 14 2022 3558. 4138. 2.45
## 15 2023 13889. 7145. 1.10
## [1] "Tweet count for each month"
## [1] "Dispersion measures for ZHAW"
## # A tibble: 12 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2012 1178. 1486. 2.09
## 2 2013 1291. 1963. 1.35
## 3 2014 1098. 1884. 1.29
## 4 2015 2032. 2341. 2.00
## 5 2016 1570. 1847. 2.00
## 6 2017 1524. 1580. 2.21
## 7 2018 1490. 1552. 2.42
## 8 2019 1448. 1570. 2.25
## 9 2020 2527. 2917. 2.38
## 10 2021 3160. 2988. 2.43
## 11 2022 5463. 6825. 2.06
## 12 2023 35371. 50020. 0.693
## [1] "Tweet count for each month"
## [1] "Dispersion measures for bfh"
## # A tibble: 12 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2012 1483. 2261. 1.68
## 2 2013 1265. 1888. 1.57
## 3 2014 1055. 1526. 1.78
## 4 2015 2209. 2797. 2.03
## 5 2016 3301. 4155. 2.19
## 6 2017 3149. 3790. 2.35
## 7 2018 2510. 2948. 2.10
## 8 2019 2059. 2686. 1.74
## 9 2020 1968. 2164. 1.97
## 10 2021 1484. 1909. 1.52
## 11 2022 1856. 2106. 2.31
## 12 2023 2800. 3313. 2.10
## [1] "Tweet count for each month"
## [1] "Dispersion measures for hes_so"
## # A tibble: 14 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2010 9353. 10637. 2.54
## 2 2011 4852. 6117. 2.09
## 3 2012 4983. 6238. 2.38
## 4 2013 7294. 17630. 1.22
## 5 2014 6485. 8252. 2.27
## 6 2015 4772. 10362. 1.34
## 7 2016 5423. 9275. 1.78
## 8 2017 8087. 14932. 1.79
## 9 2018 3107. 6086. 1.65
## 10 2019 2622. 7755. 0.699
## 11 2020 1521. 2343. 1.66
## 12 2021 2956. 4038. 1.97
## 13 2022 2427. 2218. 2.74
## 14 2023 1860. 1787. 2.11
## [1] "Tweet count for each month"
## [1] "Dispersion measures for hslu"
## # A tibble: 8 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2016 2192. 3286. 1.79
## 2 2017 1702. 2338. 1.91
## 3 2018 1261. 1765. 1.76
## 4 2019 1960. 2304. 2.20
## 5 2020 1095. 1423. 2.08
## 6 2021 569. 598. 2.41
## 7 2022 844. 905. 2.16
## 8 2023 1909. 1524. 2.58
## [1] "Tweet count for each month"
## [1] "Dispersion measures for ost_fh"
## # A tibble: 3 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2020 1478. 1653. 2.20
## 2 2021 5272. 7736. 2.06
## 3 2022 8470. 15109. 1.77
## [1] "Tweet count for each month"
## [1] "Dispersion measures for supsi_ch"
## # A tibble: 11 × 4
## year mean_interval sd_interval entropy_interval
## <dbl> <dbl> <dbl> <dbl>
## 1 2013 2536. 3892. 1.85
## 2 2014 5283. 12951. 1.15
## 3 2015 8346. 12979. 1.99
## 4 2016 2634. 5705. 1.48
## 5 2017 1396. 3394. 0.772
## 6 2018 1648. 3127. 1.45
## 7 2019 2302. 3156. 1.81
## 8 2020 3093. 3234. 2.34
## 9 2021 1769. 2297. 1.49
## 10 2022 2677. 3112. 1.88
## 11 2023 2372. 2840. 1.96
# Descriptive statistics of time intervals
summary(tweets$time_interval)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 148.2 1128.8 2097.6 2428.3 220707.0 8
The data indicates that Swiss Universities of Applied Sciences primarily tweet during working hours and show distinct patterns in their most active days and hours. Workers tend to be more productive and active on Twitter in the morning, with a noticeable decline in activity around midday and towards the end of the week.
The tweets are filtered by language, focusing on German, French, Italian, and English; these languages were chosen for their popularity across all tweet languages. We remove common and extended stopwords, including non-meaningful tokens such as 'amp' (the HTML entity for '&') and 'rt' (commonly found in retweets). The extended stopword list includes hashtags, handles, and URL fragments related to specific Swiss universities.
Next, the code processes tweets separately for each language. This involves creating tokens from the text, removing unwanted characters, stemming words, and creating n-grams. The processed tokens are then used to create Document-Feature Matrices (DFMs) for each language.
langs <- c("de", "fr", "it", "en")
tweets_filtered <- tweets %>%
filter(lang %in% langs)
# Define extended stopwords (outside loop for efficiency)
# Remove 'amp' as it is not meaningful: it is just the HTML entity for the '&' symbol
# Remove 'rt', the retweet marker ('rt' inside words such as 'engagiert' is unaffected, since we match whole tokens)
extended_stopwords <- c(
"#fhnw", "#bfh", "@htw_chur", "#hslu", "#supsi", "#sups",
"amp", "rt", "fr", "ber", "t.co", "https", "http", "www", "com", "html"
)
# Create separate DFMs for each language
dfm_list <- list()
for (sel_lang in langs) {
# Subset tweets for the current language
tweets_lang <- tweets_filtered %>%
filter(lang == sel_lang)
# Create tokens for the current language
stopwords_lang <- stopwords(sel_lang)
# Create tokens for all tweets:
# - build a corpus first, because tokens() only accepts character, corpus, list, tokens, and tokens_xptr objects
# - tokenize and remove URLs, punctuation, numbers, symbols, and separators
# - transform to lowercase
# - stem all words
# - create unigrams (bigrams and trigrams are shown later)
# - remove stopwords after stemming: stemming strips endings, so some words only match the stopword list once stemmed
tokens_lang <- tweets_lang %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem(lang = sel_lang) %>%
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(stopwords_lang, extended_stopwords), selection = "remove"
)
# Create DFM for the current language
dfm_list[[sel_lang]] <- dfm(tokens_lang)
}
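The same pipeline applied to a toy German corpus (invented sentences, not real tweets) makes each step's effect easy to inspect; `topfeatures()` then summarizes the resulting DFM:

```r
library(quanteda)

toy <- c(d1 = "Die neue Hochschule startet heute ein Projekt!",
         d2 = "Neue Studierende besuchen heute die Hochschule.")
toy_dfm <- corpus(toy) |>
  tokens(remove_punct = TRUE) |>
  tokens_tolower() |>
  tokens_wordstem(language = "german") |>
  tokens_select(pattern = stopwords("de"), selection = "remove") |>
  dfm()
# Stems such as "hochschul", "neu", and "heut" appear in both documents
topfeatures(toy_dfm, 5)
```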
Tweets were analyzed across four languages: German, French, Italian, and English.
It is important to note that some words like “right” 👉 and “arrow” ➡️ are actually names of parsed emojis, not words written in the tweets.
Word clouds for each language visually show the most common words, emphasizing their relative frequencies. Universities tweet in multiple languages, reflecting the linguistic diversity of their audience, but each can still be assigned a dominant language: BFH and FHNW tweet mostly in German, HES-SO in French, SUPSI in Italian, and HSLU in English.
# Word Frequencies & Visualization
words_freqs_en <- sort(colSums(dfm_list$en), decreasing = TRUE)
head(words_freqs_en, 20)
## student @hslu new @zhaw project univers
## 117 90 86 73 69 69
## day thank switzerland swiss scienc innov
## 68 67 67 65 64 61
## now studi great today join @fhnw
## 54 54 53 52 48 47
## research @supsi_ch
## 47 47
wordcloud(
words = names(words_freqs_en),
freq = words_freqs_en,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
words_freqs_de <- sort(colSums(dfm_list$de), decreasing = TRUE)
head(words_freqs_de, 20)
## neu mehr schweiz werd all studier heut hochschul
## 1680 1133 988 781 731 729 696 621
## statt bfh jahr bern digital thema findet knnen
## 607 598 539 533 525 524 520 517
## projekt studi welch arbeit
## 488 484 464 444
wordcloud(
words = names(words_freqs_de),
freq = words_freqs_de,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
words_freqs_it <- sort(colSums(dfm_list$it), decreasing = TRUE)
head(words_freqs_it, 20)
## nuov sups progett #supsinews info student
## 214 208 176 151 147 146
## present iscrizion cors #supsievent ricerc formazion
## 145 144 142 139 137 135
## scopr inform diplom bachelor apert tutt
## 129 120 119 113 111 109
## master dipart
## 107 105
wordcloud(
words = names(words_freqs_it),
freq = words_freqs_it,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
# Some seemingly English words appear below, but they are parsed emoji names (e.g. 'right', 'arrow'), not French text
words_freqs_fr <- sort(colSums(dfm_list$fr), decreasing = TRUE)
head(words_freqs_fr, 20)
## hes-so right arrow projet dan a tudi haut
## 515 433 324 251 249 234 200 182
## col @hes_so dcouvr @hessoval book #hes_so recherch open
## 155 150 130 129 123 119 117 117
## mast suiss plus nouveau
## 111 110 105 99
wordcloud(
words = names(words_freqs_fr),
freq = words_freqs_fr,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
# University-specific Analysis
for (uni in unique(tweets$university)) {
# Subset tweets for the current language
uni_tweets <- tweets %>%
filter(university == uni)
tokens_lang <- uni_tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>%
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
# Create Data Frame Matrix for uni with all languages
uni_dfm <- dfm(tokens_lang)
# Word Frequencies
uni_word_freqs <- sort(colSums(uni_dfm), decreasing = TRUE)
# Print the most common words; the 'right' emoji name is used often
print(paste("Most common words for", uni, ":"))
print(head(uni_word_freqs, 20))
wordcloud(
words = names(uni_word_freqs),
freq = uni_word_freqs,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
}
## [1] "Most common words for FHNW :"
## @fhnwbusi fhnw @hsafhnw mehr hochschul
## 380 320 261 233 214
## @fhnwtechnik @fhnw neue campus heut
## 207 205 160 149 149
## @fhnwpsychologi studierend neuen projekt schweiz
## 138 114 108 101 101
## knnen basel entwickelt brugg-windisch prof
## 97 95 85 83 82
## [1] "Most common words for FH_Graubuenden :"
## chur #htwchur statt htw findet
## 367 360 307 292 276
## mehr blogbeitrag #fhgr #infoanlass graubnden
## 230 203 182 179 177
## infoanlass #chur neuen heut #studium
## 157 128 128 127 123
## manag @suedostschweiz neue fh tourismus
## 111 111 103 98 96
## [1] "Most common words for ZHAW :"
## zhaw @zhaw @engineeringzhaw #zhaw
## 281 257 245 187
## cc neue dank @iam_winterthur
## 185 152 147 146
## heut mehr knnen zeigt
## 141 138 137 133
## winterthur schweizer via @sml_zhaw
## 126 125 124 120
## #zhawimpact schweiz studi gibt
## 120 115 115 105
## [1] "Most common words for bfh :"
## bfh bern berner mehr @bfh_hesb
## 606 282 231 213 208
## neue thema fachhochschul @hkb_bfh knnen
## 205 199 166 117 109
## projekt #knoten_maschen biel heut innen
## 109 107 102 102 102
## anmelden schweizer neuen schweiz statt
## 101 100 96 92 89
## [1] "Most common words for hes_so :"
## hes-so right arrow projet dan tudiant
## 532 445 331 249 248 192
## haut @hes_so cole @hessovalai book master
## 178 172 149 133 124 123
## #hes_so open recherch suiss plus magnifi
## 123 123 116 109 103 97
## glass tilt
## 97 97
## [1] "Most common words for hslu :"
## @hslu luzern mehr hochschul depart
## 368 334 258 198 175
## #hsluinformatik heut neue schweizer zeigt
## 173 171 165 146 133
## design knnen studi schweiz gibt
## 132 127 125 118 114
## jahr ab neuen projekt arbeit
## 107 105 105 103 99
## [1] "Most common words for ost_fh :"
## #ostschweizerfachhochschul @ost_fh
## 73 64
## ost @ozg_ost
## 55 28
## mehr neue
## 26 22
## st.gallen rapperswil
## 17 17
## neuen ostschweiz
## 17 15
## #informatik podcast
## 15 15
## detail gibt
## 15 14
## #ost menschen
## 12 12
## thema campus
## 12 12
## @eastdigit #podcast
## 12 12
## [1] "Most common words for supsi_ch :"
## supsi #supsiev #supsinew info studenti formazion
## 231 183 168 148 133 132
## progetto @supsi_ch iscrizioni master nuovo bachelor
## 126 126 117 117 116 115
## right innov dipartimento pi oggi @usi_univers
## 114 109 104 103 102 102
## informazioni manag
## 97 95
### User Reaction Analysis

To understand user reactions, the code calculates a 'weighted engagement' metric, combining favorite and retweet counts. The tweets with the highest engagement are analyzed by hour and day to identify patterns in user interaction.
The provided bar plots show the average engagement of tweets by hour of the day and by day of the week. Each bar represents the average engagement score for tweets posted during specific hours or on specific days.
# Calculate a 'weighted engagement' metric
tweets <- tweets %>%
mutate(
weighted_engagement = favorite_count * 1 + retweet_count * 2
)
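A quick base-R check of the weighting (the ×2 retweet weight is this notebook's own choice, reflecting that a retweet spreads content further than a like):

```r
# Toy favorite and retweet counts for three tweets
fav <- c(10, 0, 4)
rt  <- c(0, 6, 4)
weighted <- fav * 1 + rt * 2
weighted # 10 12 12: six retweets outweigh ten likes
```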
# Identify tweets with the highest weighted engagement
most_engaged_tweets <- tweets %>%
arrange(desc(weighted_engagement)) %>%
head(1000) # Top 1000 for analysis
# Calculate average engagement by hour
engagement_hour <- most_engaged_tweets %>%
group_by(timeofday_hour) %>%
summarise(avg_engagement = mean(weighted_engagement, na.rm = TRUE))
ggplot(engagement_hour, aes(x = timeofday_hour, y = avg_engagement)) +
geom_bar(stat = "identity", fill = "blue") +
labs(
title = "Average Engagement by Hour",
x = "Hour of Day",
y = "Average Engagement"
) +
theme_minimal()
# Calculate average engagement by day
engagement_day <- most_engaged_tweets %>%
group_by(day) %>%
summarise(avg_engagement = mean(weighted_engagement, na.rm = TRUE))
# Plot average engagement by day
ggplot(engagement_day, aes(x = day, y = avg_engagement)) +
geom_bar(stat = "identity", fill = "blue") +
labs(
title = "Average Engagement by Day of the Week",
x = "Day of the Week",
y = "Average Engagement"
) +
theme_minimal()
### Analyse the Content of the Most Liked Tweets

The most common words in the most liked tweets include “mehr” (more), “neue” (new), “schweiz” (Switzerland), “heut” (today), and “hochschul” (university). These words suggest that tweets focusing on new developments, events happening today, and general updates about Switzerland and universities tend to receive more likes.
# Preprocessing content of most liked tweets
tokens_most_engaged <- most_engaged_tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>% # default English stemmer: these top tweets mix languages (sel_lang would leak from the earlier loop)
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
tokens_most_engaged_dfm <- dfm(tokens_most_engaged)
freqs_most_engaged <- sort(colSums(tokens_most_engaged_dfm), decreasing = TRUE)
# Print the most common words; the 'right' emoji name is used often
head(freqs_most_engaged, 20)
## mehr neue schweiz schweizer right
## 81 67 48 47 46
## heut zeigt #hsluinformatik studi zhaw
## 44 41 40 39 39
## hes-so knnen neuen hochschul campus
## 38 38 36 34 33
## innov gibt ab entwickelt bfh
## 31 30 30 30 30
set.seed(123)
wordcloud(
words = names(freqs_most_engaged),
freq = freqs_most_engaged,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
Each university shows distinct patterns in the words and emojis used in their tweets. The analysis involved creating word clouds and identifying the most common words and emojis.
Most Common Words:
Most Common Emojis:

- FHNW: 👉 (backhand index pointing right), 💛 (yellow heart), and 🖤 (black heart)
- FH Graubünden: 🎉 (party popper), 😃 (grinning face with big eyes), and 😊 (blush)
- ZHAW: 👉 (backhand index pointing right), ⚡ (high voltage), and 😉 (wink)
- BFH: 👉 (backhand index pointing right), 🔋 (battery), and 👇 (backhand index pointing down)
- HES-SO: 👉 (backhand index pointing right), 🎓 (graduation cap), and ➡ (arrow right)
- HSLU: 🎓 (graduation cap), 👨 (man), and 🚀 (rocket)
- OST-FH: 👉 (backhand index pointing right), ➡ (arrow right), and 🎓 (graduation cap)
- SUPSI-CH: 👉 (backhand index pointing right), 🎓 (graduation cap), and 🎉 (party popper)
for (uni in unique(tweets$university)) {
uni_tweets <- tweets %>%
filter(university == uni, lang %in% langs)
tokens_uni <- uni_tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>%
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
uni_dfm <- dfm(tokens_uni)
freqs_uni <- sort(colSums(uni_dfm), decreasing = TRUE)
# Print the most common words; the 'right' emoji name is used often
print(paste("Most common words for", uni, ":"))
print(head(freqs_uni, 20))
set.seed(123)
wordcloud(
words = names(freqs_uni),
freq = freqs_uni,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
# Analyze Top Emojis by University
emoji_count_per_university <- uni_tweets %>%
top_n_emojis(full_text)
print(paste("Top emojis for", uni, ":"))
print(emoji_count_per_university)
emoji_count_per_university %>%
mutate(emoji_name = reorder(emoji_name, n)) %>%
ggplot(aes(n, emoji_name)) +
geom_col() +
labs(x = "Count", y = NULL, title = "Top 20 Emojis Used")
}
## [1] "Most common words for FHNW :"
## @fhnwbusi fhnw @hsafhnw mehr hochschul
## 377 320 257 233 214
## @fhnwtechnik @fhnw neue campus heut
## 205 203 160 149 149
## @fhnwpsychologi studierend neuen projekt schweiz
## 138 114 108 101 101
## knnen basel entwickelt brugg-windisch prof
## 97 94 85 83 82
## [1] "Top emojis for FHNW :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 backhand_index_pointing_right 👉 People & Body 57
## 2 yellow_heart 💛 Smileys & Emotion 34
## 3 black_heart 🖤 Smileys & Emotion 32
## 4 woman 👩 People & Body 28
## 5 clap 👏 People & Body 17
## 6 flag_Switzerland 🇨🇭 Flags 17
## 7 man 👨 People & Body 17
## 8 microscope 🔬 Objects 15
## 9 computer 💻 Objects 14
## 10 robot 🤖 Smileys & Emotion 14
## 11 graduation_cap 🎓 Objects 13
## 12 school 🏫 Travel & Places 13
## 13 face_with_medical_mask 😷 Smileys & Emotion 12
## 14 raised_hands 🙌 People & Body 12
## 15 female_sign ♀️ Symbols 10
## 16 star_struck 🤩 Smileys & Emotion 10
## 17 trophy 🏆 Activities 10
## 18 party_popper 🎉 Activities 9
## 19 woman_scientist 👩🔬 People & Body 9
## 20 sun_with_face 🌞 Travel & Places 8
## [1] "Most common words for FH_Graubuenden :"
## chur #htwchur statt htw findet
## 365 355 305 290 274
## mehr blogbeitrag #fhgr #infoanlass graubnden
## 230 203 182 177 176
## infoanlass #chur neuen heut #studium
## 157 128 128 127 123
## manag @suedostschweiz neue fh tourismus
## 110 108 103 98 96
## [1] "Top emojis for FH_Graubuenden :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 party_popper 🎉 Activities 19
## 2 grinning_face_with_big_eyes 😃 Smileys & Emotion 15
## 3 blush 😊 Smileys & Emotion 9
## 4 smiling_face_with_sunglasses 😎 Smileys & Emotion 8
## 5 bulb 💡 Objects 7
## 6 flexed_biceps 💪 People & Body 7
## 7 +1 👍 People & Body 6
## 8 camera_flash 📸 Objects 6
## 9 four_leaf_clover 🍀 Animals & Nature 6
## 10 grinning_face_with_smiling_eyes 😄 Smileys & Emotion 6
## 11 heart_eyes 😍 Smileys & Emotion 6
## 12 hugs 🤗 Smileys & Emotion 6
## 13 grinning 😀 Smileys & Emotion 5
## 14 computer 💻 Objects 4
## 15 female_sign ♀️ Symbols 4
## 16 graduation_cap 🎓 Objects 4
## 17 robot 🤖 Smileys & Emotion 4
## 18 backhand_index_pointing_down 👇 People & Body 3
## 19 lady_beetle 🐞 Animals & Nature 3
## 20 ocean 🌊 Travel & Places 3
## [1] "Most common words for ZHAW :"
## zhaw @zhaw @engineeringzhaw #zhaw
## 278 254 244 184
## cc neue dank @iam_winterthur
## 183 151 146 145
## heut mehr knnen zeigt
## 141 138 137 133
## winterthur schweizer via @sml_zhaw
## 126 125 124 120
## #zhawimpact schweiz studi gibt
## 120 115 115 105
## [1] "Top emojis for ZHAW :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 backhand_index_pointing_right 👉 People & Body 21
## 2 high_voltage ⚡ Travel & Places 11
## 3 wink 😉 Smileys & Emotion 9
## 4 clap 👏 People & Body 5
## 5 flag_Switzerland 🇨🇭 Flags 5
## 6 rocket 🚀 Travel & Places 5
## 7 +1 👍 People & Body 4
## 8 arrow_right ➡️ Symbols 4
## 9 bug 🐛 Animals & Nature 3
## 10 computer 💻 Objects 3
## 11 flexed_biceps 💪 People & Body 3
## 12 man 👨 People & Body 3
## 13 bangbang ‼️ Symbols 2
## 14 camera_flash 📸 Objects 2
## 15 dark_skin_tone 🏿 Component 2
## 16 exclamation ❗ Symbols 2
## 17 female_sign ♀️ Symbols 2
## 18 four_leaf_clover 🍀 Animals & Nature 2
## 19 green_salad 🥗 Food & Drink 2
## 20 grinning 😀 Smileys & Emotion 2
## [1] "Most common words for bfh :"
## bfh bern berner mehr @bfh_hesb
## 606 280 231 213 207
## neue thema fachhochschul @hkb_bfh knnen
## 205 199 166 117 109
## projekt #knoten_maschen biel heut innen
## 109 107 102 102 102
## anmelden schweizer neuen schweiz statt
## 101 100 96 92 89
## [1] "Top emojis for bfh :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 backhand_index_pointing_right 👉 People & Body 49
## 2 battery 🔋 Objects 16
## 3 backhand_index_pointing_down 👇 People & Body 12
## 4 woman 👩 People & Body 12
## 5 palm_tree 🌴 Animals & Nature 11
## 6 bulb 💡 Objects 10
## 7 computer 💻 Objects 10
## 8 evergreen_tree 🌲 Animals & Nature 10
## 9 graduation_cap 🎓 Objects 10
## 10 party_popper 🎉 Activities 10
## 11 robot 🤖 Smileys & Emotion 10
## 12 rocket 🚀 Travel & Places 10
## 13 clap 👏 People & Body 9
## 14 coconut 🥥 Food & Drink 9
## 15 date 📅 Objects 9
## 16 deciduous_tree 🌳 Animals & Nature 9
## 17 flag_Switzerland 🇨🇭 Flags 9
## 18 automobile 🚗 Travel & Places 8
## 19 clinking_glasses 🥂 Food & Drink 8
## 20 seedling 🌱 Animals & Nature 8
## [1] "Most common words for hes_so :"
## hes-so right arrow projet dan tudiant
## 529 445 331 249 248 192
## haut @hes_so cole @hessovalai book master
## 177 170 149 133 124 123
## #hes_so open recherch suiss plus magnifi
## 123 123 116 109 103 97
## glass tilt
## 97 97
## [1] "Top emojis for hes_so :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 arrow_right ➡️ Symbols 320
## 2 arrow_heading_down ⤵️ Symbols 246
## 3 book 📖 Objects 115
## 4 mag_right 🔎 Objects 97
## 5 mega 📣 Objects 53
## 6 clapper 🎬 Objects 38
## 7 NEW_button 🆕 Symbols 35
## 8 computer 💻 Objects 35
## 9 microscope 🔬 Objects 32
## 10 bulb 💡 Objects 29
## 11 police_car_light 🚨 Travel & Places 28
## 12 backhand_index_pointing_right 👉 People & Body 27
## 13 graduation_cap 🎓 Objects 23
## 14 studio_microphone 🎙️ Objects 23
## 15 clap 👏 People & Body 21
## 16 date 📅 Objects 17
## 17 medal_sports 🏅 Activities 15
## 18 memo 📝 Objects 15
## 19 woman 👩 People & Body 15
## 20 flag_Switzerland 🇨🇭 Flags 14
## [1] "Most common words for hslu :"
## @hslu luzern mehr hochschul depart
## 363 332 258 198 175
## #hsluinformatik heut neue schweizer zeigt
## 172 171 165 146 133
## design knnen studi schweiz gibt
## 132 127 125 118 114
## jahr ab neuen projekt arbeit
## 106 105 105 103 99
## [1] "Top emojis for hslu :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 sparkles ✨ Activities 29
## 2 flag_Switzerland 🇨🇭 Flags 20
## 3 party_popper 🎉 Activities 12
## 4 rocket 🚀 Travel & Places 12
## 5 partying_face 🥳 Smileys & Emotion 11
## 6 bottle_with_popping_cork 🍾 Food & Drink 9
## 7 Christmas_tree 🎄 Activities 7
## 8 clap 👏 People & Body 7
## 9 star ⭐ Travel & Places 7
## 10 glowing_star 🌟 Travel & Places 6
## 11 +1 👍 People & Body 5
## 12 bulb 💡 Objects 5
## 13 clinking_glasses 🥂 Food & Drink 5
## 14 smiling_face_with_sunglasses 😎 Smileys & Emotion 5
## 15 camera_flash 📸 Objects 4
## 16 four_leaf_clover 🍀 Animals & Nature 4
## 17 musical_notes 🎶 Objects 4
## 18 person_running 🏃 People & Body 4
## 19 raised_hands 🙌 People & Body 4
## 20 robot 🤖 Smileys & Emotion 4
## [1] "Most common words for ost_fh :"
## #ostschweizerfachhochschul @ost_fh
## 72 63
## ost @ozg_ost
## 55 28
## mehr neue
## 26 22
## st.gallen rapperswil
## 17 17
## neuen ostschweiz
## 17 15
## #informatik podcast
## 15 15
## detail gibt
## 15 14
## #ost menschen
## 12 12
## thema campus
## 12 12
## @eastdigit #podcast
## 12 12
## [1] "Top emojis for ost_fh :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 graduation_cap 🎓 Objects 3
## 2 man 👨 People & Body 2
## 3 man_student 👨🎓 People & Body 2
## 4 rocket 🚀 Travel & Places 2
## 5 snowflake ❄️ Travel & Places 2
## 6 backhand_index_pointing_right 👉 People & Body 1
## 7 brain 🧠 People & Body 1
## 8 chocolate_bar 🍫 Food & Drink 1
## 9 clapper 🎬 Objects 1
## 10 eyes 👀 People & Body 1
## 11 fire 🔥 Travel & Places 1
## 12 flexed_biceps 💪 People & Body 1
## 13 grinning 😀 Smileys & Emotion 1
## 14 heart_eyes_cat 😻 Smileys & Emotion 1
## 15 high_voltage ⚡ Travel & Places 1
## 16 mantelpiece_clock 🕰️ Travel & Places 1
## 17 sleeping 😴 Smileys & Emotion 1
## 18 slightly_smiling_face 🙂 Smileys & Emotion 1
## 19 sun ☀️ Travel & Places 1
## 20 woman 👩 People & Body 1
## [1] "Most common words for supsi_ch :"
## supsi #supsiev #supsinew info formazion studenti
## 224 175 167 148 132 132
## progetto @supsi_ch iscrizioni master nuovo bachelor
## 126 123 117 117 116 113
## right innov dipartimento pi oggi @usi_univers
## 113 108 103 103 102 98
## informazioni manag
## 97 94
## [1] "Top emojis for supsi_ch :"
## # A tibble: 20 × 4
## emoji_name unicode emoji_category n
## <chr> <chr> <chr> <int>
## 1 arrow_right ➡️ Symbols 84
## 2 backhand_index_pointing_right 👉 People & Body 24
## 3 arrow_forward ▶️ Symbols 18
## 4 graduation_cap 🎓 Objects 17
## 5 bulb 💡 Objects 10
## 6 flag_Switzerland 🇨🇭 Flags 9
## 7 rocket 🚀 Travel & Places 9
## 8 party_popper 🎉 Activities 8
## 9 clap 👏 People & Body 7
## 10 exclamation ❗ Symbols 5
## 11 SOON_arrow 🔜 Symbols 4
## 12 grinning_face_with_big_eyes 😃 Smileys & Emotion 4
## 13 Christmas_tree 🎄 Activities 3
## 14 camera_flash 📸 Objects 3
## 15 computer 💻 Objects 3
## 16 movie_camera 🎥 Objects 3
## 17 pushpin 📌 Objects 3
## 18 rainbow 🌈 Travel & Places 3
## 19 studio_microphone 🎙️ Objects 3
## 20 woman 👩 People & Body 3
# Generate general tokens for bigram and trigram analysis
tokens <- tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
# Bigram Wordcloud
bi_gram_tokens <- tokens_ngrams(tokens, n = 2)
dfm_bi_gram <- dfm(bi_gram_tokens)
freqs_bi_gram <- sort(colSums(dfm_bi_gram), decreasing = TRUE)
head(freqs_bi_gram, 20)
## right_arrow htw_chur index_point
## 421 259 207
## backhand_index hochschul_luzern point_right
## 206 185 183
## berner_fachhochschul sozial_arbeit prof_dr
## 157 154 142
## haut_cole herzlich_gratul open_book
## 141 139 117
## magnifi_glass glass_tilt tilt_right
## 97 97 97
## fh_graubnden neusten_blogbeitrag book_#revuehmisphr
## 91 87 85
## social_media advanc_studi
## 84 83
# Create the bigram word cloud
set.seed(123)
wordcloud(
words = names(freqs_bi_gram),
freq = freqs_bi_gram,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
# Trigram Wordcloud
tri_gram_tokens <- tokens_ngrams(tokens, n = 3)
dfm_tri_gram <- dfm(tri_gram_tokens)
freqs_tri_gram <- sort(colSums(dfm_tri_gram), decreasing = TRUE)
head(freqs_tri_gram, 20)
## backhand_index_point index_point_right
## 206 183
## magnifi_glass_tilt glass_tilt_right
## 97 97
## open_book_#revuehmisphr hochschul_gestaltung_kunst
## 85 62
## dipartimento_tecnologi_innov master_advanc_studi
## 40 38
## depart_sozial_arbeit #infoanlass_mrz_findet
## 36 33
## polic_car_light univers_appli_scienc
## 32 31
## busi_administr_statt findet_#zrich_infoanlass
## 30 30
## tag_offenen_tr hochschul_life_scienc
## 29 29
## gestaltung_kunst_fhnw mas_busi_administr
## 29 28
## mehr_neuen_blogbeitrag mehr_neusten_blogbeitrag
## 28 28
# Create the trigram word cloud
set.seed(123)
wordcloud(
words = names(freqs_tri_gram),
freq = freqs_tri_gram,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
### LDA Topic Modeling

The Latent Dirichlet Allocation (LDA) model was applied to the entire dataset of tweets to identify common topics. Here, a model with 5 topics was selected, and the top terms for each topic were extracted.
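The choice of k = 5 is a modeling decision; one common way to sanity-check it is to compare model perplexity over a few candidate topic counts. A minimal sketch, assuming the `new_dfm` object constructed in the following chunk and the `topicmodels` package (the exact values depend on the data):

```r
# Sketch: compare perplexity for several candidate numbers of topics.
# Lower perplexity indicates a better fit; look for an "elbow" in the curve.
candidate_ks <- c(3, 5, 7, 10)
perplexities <- sapply(candidate_ks, function(k) {
perplexity(LDA(new_dfm, k = k, control = list(seed = 123)))
})
plot(candidate_ks, perplexities, type = "b",
xlab = "Number of topics (k)", ylab = "Perplexity")
```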
# Source: Christoph Zangger -> removes all rows containing only zeros
new_dfm <- dfm_subset(dfm_list$en, ntoken(dfm_list$en) > 0)
tweet_lda <- LDA(new_dfm, k = 5, control = list(seed = 123))
# Tidy the LDA results
topic_terms <- tidy(tweet_lda, matrix = "beta")
# Extract topics and top terms
topics <- as.data.frame(terms(tweet_lda, 50)) # First fifty words per topic
# Extract top terms per topic
top_terms <- topic_terms %>%
group_by(topic) %>%
top_n(8, beta) %>% # Show top 8 terms per topic
ungroup() %>%
arrange(topic, -beta)
# Visualize top terms per topic
top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~topic, scales = "free") +
scale_y_reordered() +
labs(
x = "Beta (Term Importance within Topic)",
y = "Terms",
title = "Top Terms per Topic in Tweets (LDA)"
) +
theme_minimal()
# Most different words among topics (using log ratios)
diff <- topic_terms %>%
mutate(topic = paste0("topic", topic)) %>%
spread(topic, beta) %>%
filter(topic1 > .001 | topic2 > .001 | topic3 > .001) %>%
mutate(
logratio_t1t2 = log2(topic2 / topic1),
logratio_t1t3 = log2(topic3 / topic1),
logratio_t2t3 = log2(topic3 / topic2)
)
diff
## # A tibble: 313 × 9
## term topic1 topic2 topic3 topic4 topic5 logratio_t1t2 logratio_t1t3
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 @academi… 2.26e-3 6.89e-4 4.17e-3 8.28e-5 2.11e-3 -1.71 0.881
## 2 @bfh_hesb 1.03e-3 5.87e-3 1.19e-3 4.82e-3 2.62e-3 2.52 0.218
## 3 @empa_ch 1.07e-3 4.16e-6 1.58e-4 5.76e-5 4.35e-4 -8.01 -2.76
## 4 @enginee… 6.40e-5 1.38e-4 1.73e-3 1.19e-4 7.10e-4 1.11 4.75
## 5 @esnchur 5.17e-4 4.43e-4 1.03e-3 4.06e-4 1.90e-5 -0.222 0.992
## 6 @fh_grau… 6.45e-4 1.45e-3 6.90e-4 9.07e-4 7.90e-4 1.17 0.0983
## 7 @fhnw 3.90e-4 3.08e-3 5.15e-3 2.83e-3 4.77e-3 2.98 3.72
## 8 @fhnwbusi 3.75e-3 5.59e-3 1.95e-3 4.77e-4 1.00e-3 0.577 -0.945
## 9 @greater… 5.47e-4 1.24e-3 1.09e-3 1.22e-3 7.35e-4 1.18 0.993
## 10 @grstift… 2.04e-4 3.60e-3 1.59e-3 1.25e-3 6.04e-4 4.14 2.96
## # ℹ 303 more rows
## # ℹ 1 more variable: logratio_t2t3 <dbl>
# LDA Topic Modeling for each university
universities <- unique(tweets$university)
for (uni in universities) {
# Filter tweets for the current university
uni_tweets <- tweets %>% filter(university == uni & lang %in% langs)
tokens_uni <- uni_tweets %>%
corpus(text_field = "full_text_emojis") %>%
tokens(
remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE,
remove_url = TRUE, remove_separators = TRUE
) %>%
tokens_tolower() %>%
tokens_wordstem() %>%
tokens_ngrams(n = 1) %>%
tokens_select(
pattern =
c(
stopwords("en"), stopwords("de"),
stopwords("fr"), stopwords("it"), extended_stopwords
), selection = "remove"
)
uni_dfm <- dfm(tokens_uni)
# Apply LDA
uni_dfm <- dfm_subset(uni_dfm, ntoken(uni_dfm) > 0)
tweet_lda <- LDA(uni_dfm, k = 5, control = list(seed = 123))
# Tidy the LDA results
tweet_lda_td <- tidy(tweet_lda)
# Extract top terms per topic
top_terms <- tweet_lda_td %>%
group_by(topic) %>%
top_n(8, beta) %>%
ungroup() %>%
arrange(topic, -beta)
# Visualize top terms per topic
p <- top_terms %>%
mutate(term = reorder_within(term, beta, topic)) %>%
ggplot(aes(beta, term, fill = factor(topic))) +
geom_col(show.legend = FALSE) +
facet_wrap(~topic, scales = "free") +
scale_y_reordered() +
labs(
x = "Beta (Term Importance within Topic)",
y = NULL,
title = paste("Top Terms per Topic in Tweets from", uni, "(LDA)")
)
print(p)
# Topic Model Summary: top 10 terms per topic
cat("\nTopic Model Summary for", uni, ":\n")
print(as.data.frame(terms(tweet_lda, 10)))
}
##
## Topic Model Summary for FHNW :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 @hsafhnw fhnw @fhnwbusi @fhnwbusi @fhnwbusi
## 2 neue hochschul @hsafhnw fhnw mehr
## 3 heut index @fhnw campus @fhnw
## 4 fhnw backhand hochschul entwickelt @fhnwtechnik
## 5 @fhnwtechnik @fhnwpsychologi @fhnwtechnik brugg-windisch neuen
## 6 olten point heut mehr neue
## 7 mehr projekt mehr @fhnwpsychologi ab
## 8 hochschul heut studierend @fhnwtechnik studierend
## 9 swiss right fhnw basel geht
## 10 basel neuen gibt schweizer erklrt
##
## Topic Model Summary for FH_Graubuenden :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 chur @htwchurtour chur statt mehr
## 2 htw student studi findet blogbeitrag
## 3 #chur #htwchur via #infoanlass neuen
## 4 #htwchur @suedostschweiz htw infoanlass #fhgr
## 5 #studium tourism #htwchur #htwchur graubnden
## 6 #fhgr @fh_graubuenden #smartcultur chur neusten
## 7 graubnden studierend heut manag fh
## 8 #schweiz dank @suedostschweiz mas prof
## 9 #graubnden @infowisschur media busi dr
## 10 studium institut @clickandtri mrz wurd
##
## Topic Model Summary for ZHAW :
## Topic 1 Topic 2 Topic 3 Topic 4
## 1 zhaw knnen zhaw @zhaw
## 2 @engineeringzhaw @engineeringzhaw @engineeringzhaw cc
## 3 @zhaw #zhaw neue #zhawimpact
## 4 winterthur zeigt geht zhaw
## 5 knnen studi mehr @iam_winterthur
## 6 @sml_zhaw neue cc heut
## 7 via dank zeigt #zhaw
## 8 zeigt schweizer #zhaw via
## 9 heut heut gibt zukunft
## 10 ab neuen immer neuen
## Topic 5
## 1 @zhaw
## 2 #zhaw
## 3 @iam_winterthur
## 4 studi
## 5 zhaw
## 6 dank
## 7 mehr
## 8 @sml_zhaw
## 9 #toniar
## 10 #tonitag
##
## Topic Model Summary for bfh :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 bfh bern bfh bfh neue
## 2 fachhochschul @bfh_hesb @bfh_hesb thema thema
## 3 berner berner schweiz mehr #knoten_maschen
## 4 bern mehr mehr knnen berner
## 5 mehr bfh neuen @hkb_bfh bern
## 6 @hkb_bfh unternehmen neue bern schweizer
## 7 entwickelt neue innen statt welch
## 8 anmelden projekt bern biel gibt
## 9 arbeit ab gesundheit fachhochschul @bfh_sosec
## 10 sozial thema schweizer erfahren day
##
## Topic Model Summary for hes_so :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 projet right hes-so right hes-so
## 2 arrow projet tudiant arrow haut
## 3 dan cole dan open right
## 4 right tilt @hes_so book tudiant
## 5 #hes_so dan arrow master arrow
## 6 @hessovalai glass projet #revuehmisphr projet
## 7 dcouvrez arrow recherch suiss dan
## 8 @hes_so journ nouveau recherch @hessovalai
## 9 #revuehmisphr format suiss cole @radiotelesuiss
## 10 particip @hes_so haut scienc #hes_so
##
## Topic Model Summary for hslu :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 @hslu luzern @hslu zeigt #hsluinformatik
## 2 luzern @hslu luzern heut mehr
## 3 mehr depart hochschul digit schweizer
## 4 #hsluinformatik hochschul knnen mehr @hslu
## 5 neue heut mehr hochschul projekt
## 6 depart neue campus welch #hsluwirtschaft
## 7 thema mehr jahr projekt schweiz
## 8 design geht #hsludk neue luzern
## 9 schweizer zeigt findet schweiz studierend
## 10 statt knnen statt studi studi
##
## Topic Model Summary for ost_fh :
## Topic 1 Topic 2
## 1 #ostschweizerfachhochschul #ostschweizerfachhochschul
## 2 @ost_fh @ost_fh
## 3 bachelor ost
## 4 neue @ozg_ost
## 5 #diplomfei kontrast
## 6 podcast kulturzyklus
## 7 ab #ausdreiwirdein
## 8 #bachelor fhs
## 9 online-infoabend #ost
## 10 statt mehr
## Topic 3 Topic 4
## 1 @ost_fh ost
## 2 ost @ost_fh
## 3 #ostschweizerfachhochschul @ozg_ost
## 4 rapperswil-jona #ostschweizerfachhochschul
## 5 campus rapperswil
## 6 detail neue
## 7 neu neuen
## 8 #wirtschaftsinformatik ostschweiz
## 9 gibt st.gallen
## 10 #informatik mehr
## Topic 5
## 1 #ostschweizerfachhochschul
## 2 menschen
## 3 mehr
## 4 @ost_fh
## 5 ost
## 6 spricht
## 7 prof
## 8 institut
## 9 team
## 10 @ozg_ost
##
## Topic Model Summary for supsi_ch :
## Topic 1 Topic 2 Topic 3 Topic 4 Topic 5
## 1 progetto formazion iscrizioni #supsinew supsi
## 2 #supsiev studenti master supsi info
## 3 @usi_univers oggi supsi #supsiev bachelor
## 4 #supsinew informazioni innov studenti corsi
## 5 competenz supsi apert right pi
## 6 arrow ottobr info formazion #supsiev
## 7 svizzera @supsi_ch bachelor @supsi_ch nuovo
## 8 @supsi_ch dipartimento design info progetto
## 9 informazioni maggiori advanc master studi
## 10 supsi iscrizioni @usi_univers ricerca campus
The LDA analysis reveals distinct topics across all tweets, emphasizing academic activities, events, and institutional developments. Each university exhibits unique themes reflecting its individual focus areas and regional characteristics. This topic modeling clarifies the primary subjects of interest and the communication patterns across the Swiss universities of applied sciences.
The distribution of tweet lengths varies across universities. Most tweets are concise, in line with Twitter’s character limit, though the exact distribution differs among institutions. Notably, many tweets cluster around 150 characters, and none of the universities post particularly long tweets; short posts are typical of social media communication.
tweets %>%
mutate(tweet_length = nchar(full_text)) %>%
ggplot(aes(x = tweet_length)) +
geom_histogram() +
labs(title = "Distribution of Tweet Lengths")
### Sentiment Analysis

Sentiment analysis was conducted to evaluate the emotional tone of the tweets. A sentiment score was calculated for each tweet, using the Syuzhet lexicon for English tweets and the NRC lexicon for German, French, and Italian tweets.
Overall Sentiment Trends:

- The sentiment scores vary over time and by university, showing fluctuations in the emotional tone of the tweets.
- Positive words commonly found in tweets include terms related to academic achievements, collaborations, and positive experiences.
- Negative words often relate to challenges, competitions, and issues faced by the universities.
Sentiment by University:

- FHNW: Positive words include “academy”, “accelerate”, and “activities”. Negative words include “avoid”, “bacteria”, and “challenge”.
- FH Graubünden: Positive words include “able”, “academic”, and “advantage”. Negative words include “competition”, “corruption”, and “fire”.
- ZHAW: Positive words include “abilities”, “academic”, and “achievement”. Negative words include “barrier”, “challenge”, and “competition”.
- BFH: Positive words include “academic”, “access”, and “activities”. Negative words include “aggression”, “competition”, and “fail”.
- HES-SO: Positive words include “academic”, “active”, and “amazing”. Negative words include “confessions”, “failure”, and “hard”.
- HSLU: Positive words include “academic”, “access”, and “achievement”. Negative words include “addiction”, “challenge”, and “fail”.
- OST-FH: Positive words include “announce”, “beautiful”, and “collaboration”. Negative words are minimal, including “dire” and “fire”.
- SUPSI-CH: Positive words include “academic”, “access”, and “achievement”. Negative words include “barrier”, “cloud”, and “danger”.
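The per-language scoring used below can be illustrated in isolation. A minimal sketch with the `syuzhet` package (the scores returned depend on the installed lexicon versions, so none are shown here):

```r
# Sketch: score single texts with the lexicons used in this analysis
# (Syuzhet for English, NRC for German/French/Italian).
library(syuzhet)
get_sentiment("We are proud of this great achievement!", method = "syuzhet")
get_sentiment("Das ist ein schweres Problem.", method = "nrc", language = "german")
```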
# Calculate Sentiment for Supported Languages Only
langs <- c("de", "fr", "it", "en")
tweets_filtered <- tweets %>%
filter(lang %in% langs)
# Function to get sentiment based on language
get_multilang_sentiment <- function(text, lang) {
if (lang == "de") {
return(get_sentiment(text, method = "nrc", language = "german"))
} else if (lang == "it") {
return(get_sentiment(text, method = "nrc", language = "italian"))
} else if (lang == "fr") {
return(get_sentiment(text, method = "nrc", language = "french"))
} else if (lang == "en") {
return(get_sentiment(text, method = "syuzhet"))
} else {
return(NA) # Return NA for unsupported languages
}
}
# Calculate Syuzhet Sentiment for each Tweet
tweets_filtered$sentiment <-
mapply(
get_multilang_sentiment,
tweets_filtered$full_text, tweets_filtered$lang
)
plot_data <- tweets_filtered %>%
group_by(university, month) %>%
summarize(mean_sentiment = mean(sentiment, na.rm = TRUE))
# Convert month to Date format
plot_data <- plot_data %>%
mutate(month = as.Date(month, format = "%Y-%m-%d"))
# Plot Sentiment by all Universities
ggplot(plot_data, aes(
x = month,
y = mean_sentiment,
color = university, group = university
)) +
geom_line() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(
title = "Mean Sentiment Over Time by University",
y = "Mean Sentiment Score",
x = "Month"
) +
scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m") +
# Apply theme_minimal() first so it does not override the axis-text rotation
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
for (uni in unique(tweets_filtered$university)) {
most_used_lang <- tweets %>%
filter(university == uni) %>%
count(lang) %>%
slice_max(n = 1, order_by = n) %>%
pull(lang)
uni_tweets <- tweets_filtered %>%
filter(university == uni & lang == most_used_lang)
plot_data <- uni_tweets %>%
group_by(month) %>%
summarize(mean_sentiment = mean(sentiment, na.rm = TRUE))
# Plot Syuzhet Sentiment Over Time (Per University)
print(ggplot(plot_data, aes(x = month, y = mean_sentiment, group = 1)) +
geom_line() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(
title = paste0("Mean Syuzhet Sentiment Over Time - ", uni),
y = "Mean Sentiment Score",
x = "Month"
))
# Tokenize and Preprocess Words
uni_words <- uni_tweets %>%
unnest_tokens(word, full_text_emojis) %>%
filter(nchar(word) > 3) %>%
filter(!str_detect(word, "\\d")) %>%
filter(!str_detect(word, "https?://\\S+|www\\.\\S+|t\\.co|http|https"))
# Remove stopwords after counting word frequency
word_counts <- uni_words %>%
count(word, sort = TRUE) %>%
anti_join(get_stopwords(language = most_used_lang), by = "word")
sentiment_words <- word_counts %>%
mutate(sentiment = get_multilang_sentiment(word, most_used_lang))
# Separate Positive and Negative Words
positive_words <- sentiment_words %>%
filter(sentiment >= 0) %>%
arrange(desc(n)) %>%
rename(freq = n)
negative_words <- sentiment_words %>%
filter(sentiment < 0) %>%
arrange(desc(n)) %>%
rename(freq = n)
# Create and Display Word Clouds
print(paste0("Positive words for: ", uni, " in ", most_used_lang))
print(head(positive_words, 20))
wordcloud(
words = positive_words$word,
freq = positive_words$freq,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
print(paste0("Negative words for: ", uni, " in ", most_used_lang))
print(head(negative_words, 20))
wordcloud(
words = negative_words$word,
freq = negative_words$freq,
max.words = 200,
random.order = FALSE,
colors = brewer.pal(8, "Dark2")
)
}
## [1] "Positive words for: FHNW in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 FHNW fhnw 1598 0
## 2 FHNW fhnwbusiness 342 0
## 3 FHNW hsafhnw 253 0
## 4 FHNW mehr 232 0
## 5 FHNW hochschule 219 0
## 6 FHNW fhnwtechnik 204 0
## 7 FHNW campus 170 0
## 8 FHNW heute 147 0
## 9 FHNW fhnwpsychologie 138 0
## 10 FHNW basel 134 0
## 11 FHNW neue 134 0
## 12 FHNW studierende 123 0
## 13 FHNW brugg 111 0
## 14 FHNW neuen 108 0
## 15 FHNW schweiz 107 0
## 16 FHNW windisch 101 0
## 17 FHNW olten 98 0
## 18 FHNW knnen 97 0
## 19 FHNW projekt 86 0
## 20 FHNW prof 83 0
## [1] "Negative words for: FHNW in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 FHNW wettbewerb 13 -1
## 2 FHNW problem 12 -1
## 3 FHNW trotz 12 -2
## 4 FHNW bakterien 10 -1
## 5 FHNW herausforderung 10 -1
## 6 FHNW junge 10 -1
## 7 FHNW krise 10 -1
## 8 FHNW sucht 10 -1
## 9 FHNW spiel 8 -1
## 10 FHNW spielen 6 -1
## 11 FHNW kaum 5 -1
## 12 FHNW lust 5 -1
## 13 FHNW nachfrage 5 -1
## 14 FHNW reihe 5 -1
## 15 FHNW schwer 5 -3
## 16 FHNW tragen 5 -1
## 17 FHNW warten 5 -1
## 18 FHNW angriff 4 -2
## 19 FHNW bisher 4 -1
## 20 FHNW eingeweiht 4 -1
## [1] "Positive words for: FH_Graubuenden in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 FH_Graubuenden htw_chur 553 0
## 2 FH_Graubuenden chur 469 0
## 3 FH_Graubuenden htwchur 345 0
## 4 FH_Graubuenden infoanlass 327 0
## 5 FH_Graubuenden statt 299 0
## 6 FH_Graubuenden findet 271 0
## 7 FH_Graubuenden graubnden 258 0
## 8 FH_Graubuenden studium 255 0
## 9 FH_Graubuenden mehr 230 0
## 10 FH_Graubuenden blogbeitrag 213 0
## 11 FH_Graubuenden tourismus 177 0
## 12 FH_Graubuenden fhgr 172 0
## 13 FH_Graubuenden bachelor 162 0
## 14 FH_Graubuenden management 136 0
## 15 FH_Graubuenden neuen 128 0
## 16 FH_Graubuenden heute 125 0
## 17 FH_Graubuenden schweiz 123 0
## 18 FH_Graubuenden multimedia 107 0
## 19 FH_Graubuenden photonics 104 0
## 20 FH_Graubuenden suedostschweiz 104 0
## [1] "Negative words for: FH_Graubuenden in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 FH_Graubuenden sucht 15 -1
## 2 FH_Graubuenden wettbewerb 13 -1
## 3 FH_Graubuenden korruption 12 -1
## 4 FH_Graubuenden junge 10 -1
## 5 FH_Graubuenden hoch 8 -1
## 6 FH_Graubuenden herausforderung 7 -1
## 7 FH_Graubuenden rutsch 7 -1
## 8 FH_Graubuenden kick 6 -1
## 9 FH_Graubuenden vergessen 6 -3
## 10 FH_Graubuenden kochen 5 -1
## 11 FH_Graubuenden problem 5 -1
## 12 FH_Graubuenden lust 4 -1
## 13 FH_Graubuenden spielen 4 -1
## 14 FH_Graubuenden fall 3 -1
## 15 FH_Graubuenden falls 3 -1
## 16 FH_Graubuenden gefahr 3 -2
## 17 FH_Graubuenden kampf 3 -5
## 18 FH_Graubuenden krebs 3 -2
## 19 FH_Graubuenden krise 3 -1
## 20 FH_Graubuenden nachfrage 3 -1
## [1] "Positive words for: ZHAW in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 ZHAW zhaw 901 0
## 2 ZHAW engineeringzhaw 236 0
## 3 ZHAW iam_winterthur 141 0
## 4 ZHAW studie 139 1
## 5 ZHAW mehr 138 0
## 6 ZHAW schweiz 138 0
## 7 ZHAW knnen 137 0
## 8 ZHAW heute 134 0
## 9 ZHAW zeigt 131 0
## 10 ZHAW schweizer 126 0
## 11 ZHAW unsere 126 0
## 12 ZHAW sml_zhaw 123 0
## 13 ZHAW zhawimpact 120 0
## 14 ZHAW neue 117 0
## 15 ZHAW winterthur 117 0
## 16 ZHAW danke 102 0
## 17 ZHAW studierende 84 0
## 18 ZHAW gibt 83 0
## 19 ZHAW neuen 82 0
## 20 ZHAW dass 81 0
## [1] "Negative words for: ZHAW in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 ZHAW trotz 19 -2
## 2 ZHAW fall 15 -1
## 3 ZHAW junge 14 -1
## 4 ZHAW sucht 11 -1
## 5 ZHAW falls 10 -1
## 6 ZHAW krise 9 -1
## 7 ZHAW problem 9 -1
## 8 ZHAW spielen 8 -1
## 9 ZHAW behinderung 7 -5
## 10 ZHAW herausforderung 7 -1
## 11 ZHAW bisher 6 -1
## 12 ZHAW bund 6 -1
## 13 ZHAW druck 6 -1
## 14 ZHAW fehlt 5 -2
## 15 ZHAW hoch 5 -1
## 16 ZHAW laut 5 -1
## 17 ZHAW rauch 5 -1
## 18 ZHAW schwer 5 -3
## 19 ZHAW stress 5 -1
## 20 ZHAW tragen 5 -1
## [1] "Positive words for: bfh in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 bfh bern 290 0
## 2 bfh berner 226 0
## 3 bfh mehr 213 0
## 4 bfh thema 196 0
## 5 bfh neue 184 0
## 6 bfh bfh_hesb 169 0
## 7 bfh fachhochschule 164 0
## 8 bfh biel 112 0
## 9 bfh hkb_bfh 112 0
## 10 bfh knnen 109 0
## 11 bfh holz 106 0
## 12 bfh knoten_maschen 106 0
## 13 bfh hafl 104 0
## 14 bfh heute 101 0
## 15 bfh anmelden 100 0
## 16 bfh innen 100 0
## 17 bfh schweiz 100 0
## 18 bfh schweizer 100 0
## 19 bfh neuen 96 0
## 20 bfh online 93 0
## [1] "Negative words for: bfh in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 bfh krise 22 -1
## 2 bfh wettbewerb 19 -1
## 3 bfh armut 15 -1
## 4 bfh junge 11 -1
## 5 bfh herausforderung 9 -1
## 6 bfh batterie 8 -1
## 7 bfh boden 8 -2
## 8 bfh schwer 8 -3
## 9 bfh stress 8 -1
## 10 bfh sucht 8 -1
## 11 bfh bisher 7 -1
## 12 bfh kosten 7 -1
## 13 bfh liegen 7 -1
## 14 bfh prozess 7 -1
## 15 bfh sterben 7 -1
## 16 bfh trotz 7 -2
## 17 bfh fehlt 6 -2
## 18 bfh hoch 6 -1
## 19 bfh kaum 6 -1
## 20 bfh spielen 6 -1
## [1] "Positive words for: hes_so in fr"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 hes_so right 433 0
## 2 hes_so arrow 324 0
## 3 hes_so hes_so 267 0
## 4 hes_so recherche 184 0
## 5 hes_so projet 148 0
## 6 hes_so master 144 0
## 7 hes_so hessovalais 133 0
## 8 hes_so suisse 126 0
## 9 hes_so open 117 0
## 10 hes_so book 116 0
## 11 hes_so haute 108 0
## 12 hes_so projets 104 0
## 13 hes_so sant 104 0
## 14 hes_so tudiantes 104 0
## 15 hes_so plus 103 0
## 16 hes_so revuehmisphres 97 0
## 17 hes_so glass 96 0
## 18 hes_so magnifying 96 0
## 19 hes_so tilted 96 0
## 20 hes_so genve 86 0
## [1] "Negative words for: hes_so in fr"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 hes_so programme 58 -1
## 2 hes_so lance 47 -1
## 3 hes_so entre 46 -1
## 4 hes_so appel 45 -1
## 5 hes_so salon 41 -1
## 6 hes_so contre 19 -1
## 7 hes_so crise 19 -1
## 8 hes_so tous 19 -1
## 9 hes_so livre 13 -1
## 10 hes_so rencontre 12 -1
## 11 hes_so vice 11 -1
## 12 hes_so demande 9 -1
## 13 hes_so vide 8 -1
## 14 hes_so destin 7 -1
## 15 hes_so plein 7 -1
## 16 hes_so sujet 7 -1
## 17 hes_so disposition 6 -1
## 18 hes_so douleur 6 -3
## 19 hes_so faon 6 -1
## 20 hes_so campagne 5 -1
## [1] "Positive words for: hslu in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 hslu hslu 1645 0
## 2 hslu luzern 288 0
## 3 hslu mehr 257 0
## 4 hslu unsere 201 0
## 5 hslu hochschule 197 0
## 6 hslu informatik 183 0
## 7 hslu hsluinformatik 173 0
## 8 hslu heute 171 0
## 9 hslu schweizer 145 0
## 10 hslu studie 141 1
## 11 hslu design 137 0
## 12 hslu neue 136 0
## 13 hslu zeigt 132 0
## 14 hslu knnen 124 0
## 15 hslu arbeit 121 2
## 16 hslu schweiz 121 0
## 17 hslu bachelor 119 0
## 18 hslu departement 116 0
## 19 hslu studierende 113 0
## 20 hslu kunst 108 2
## [1] "Negative words for: hslu in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 hslu junge 19 -1
## 2 hslu quiz 19 -1
## 3 hslu trotz 17 -2
## 4 hslu wettbewerb 17 -1
## 5 hslu krise 16 -1
## 6 hslu spiel 13 -1
## 7 hslu spielen 12 -1
## 8 hslu kaum 11 -1
## 9 hslu tragen 9 -1
## 10 hslu gewalt 8 -1
## 11 hslu problem 8 -1
## 12 hslu sucht 8 -1
## 13 hslu herausforderung 7 -1
## 14 hslu hype 7 -1
## 15 hslu kraft 7 -1
## 16 hslu bisher 6 -1
## 17 hslu bund 6 -1
## 18 hslu druck 6 -1
## 19 hslu hoch 6 -1
## 20 hslu kosten 6 -1
## [1] "Positive words for: ost_fh in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 ost_fh ostschweizerfachhochschule 72 0
## 2 ost_fh ost_fh 57 0
## 3 ost_fh podcast 32 0
## 4 ost_fh ozg_ost 28 0
## 5 ost_fh mehr 26 0
## 6 ost_fh kulturzyklus 24 0
## 7 ost_fh bachelor 22 0
## 8 ost_fh rapperswil 22 0
## 9 ost_fh informatik 21 0
## 10 ost_fh online 21 0
## 11 ost_fh neue 18 0
## 12 ost_fh neuen 17 0
## 13 ost_fh st.gallen 17 0
## 14 ost_fh ostschweizer 16 0
## 15 ost_fh details 15 0
## 16 ost_fh kontrast 15 0
## 17 ost_fh fachhochschule 14 0
## 18 ost_fh campus 13 0
## 19 ost_fh unsere 13 0
## 20 ost_fh eastdigital 12 0
## [1] "Negative words for: ost_fh in de"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 ost_fh reihe 4 -1
## 2 ost_fh abfall 2 -2
## 3 ost_fh behinderung 2 -5
## 4 ost_fh fall 2 -1
## 5 ost_fh junge 2 -1
## 6 ost_fh knapp 2 -1
## 7 ost_fh schlagen 2 -2
## 8 ost_fh trotz 2 -2
## 9 ost_fh aufsehen 1 -1
## 10 ost_fh ausrichtung 1 -1
## 11 ost_fh blind 1 -2
## 12 ost_fh blindheit 1 -1
## 13 ost_fh dringend 1 -1
## 14 ost_fh entscheiden 1 -1
## 15 ost_fh epilepsie 1 -1
## 16 ost_fh fesseln 1 -1
## 17 ost_fh gegenstand 1 -1
## 18 ost_fh gegner 1 -2
## 19 ost_fh gewalt 1 -1
## 20 ost_fh hype 1 -1
## [1] "Positive words for: supsi_ch in it"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 supsi_ch supsi 959 0
## 2 supsi_ch supsinews 152 0
## 3 supsi_ch bachelor 147 0
## 4 supsi_ch formazione 146 0
## 5 supsi_ch info 138 0
## 6 supsi_ch studenti 136 0
## 7 supsi_ch supsievent 136 0
## 8 supsi_ch master 129 0
## 9 supsi_ch lugano 125 0
## 10 supsi_ch progetto 125 0
## 11 supsi_ch nuovo 115 0
## 12 supsi_ch ticino 115 0
## 13 supsi_ch iscrizioni 114 0
## 14 supsi_ch dipartimento 102 0
## 15 supsi_ch oggi 102 0
## 16 supsi_ch ricerca 100 1
## 17 supsi_ch informazioni 97 1
## 18 supsi_ch right 95 0
## 19 supsi_ch scopri 90 0
## 20 supsi_ch svizzera 90 0
## [1] "Negative words for: supsi_ch in it"
## # A tibble: 20 × 4
## # Groups: university [1]
## university word freq sentiment
## <chr> <chr> <int> <dbl>
## 1 supsi_ch incontro 17 -1
## 2 supsi_ch partire 13 -1
## 3 supsi_ch partenza 12 -1
## 4 supsi_ch emergenza 11 -1
## 5 supsi_ch crisi 10 -1
## 6 supsi_ch sfida 10 -2
## 7 supsi_ch campagna 8 -1
## 8 supsi_ch discutere 6 -1
## 9 supsi_ch gruppo 6 -1
## 10 supsi_ch rischio 6 -1
## 11 supsi_ch disciplina 5 -1
## 12 supsi_ch giovanni 5 -1
## 13 supsi_ch intervento 5 -1
## 14 supsi_ch conseguenze 4 -1
## 15 supsi_ch disagio 4 -3
## 16 supsi_ch fondo 4 -1
## 17 supsi_ch gioco 4 -1
## 18 supsi_ch parole 4 -1
## 19 supsi_ch perdere 4 -2
## 20 supsi_ch periodo 4 -1
### Conclusion
The analysis indicates that Swiss Universities of Applied Sciences exhibit diverse tweeting patterns in content, style, and emotional tone. Tweets typically focus on academic achievements, projects, and institutional news, with emotional tones varying across universities. Recognizing these patterns can help universities optimize their social media strategies and engage their audiences more effectively.
The comprehensive analysis of BFH’s tweets yields several insights that can inform its communication strategy.
BFH tweets predominantly in German (3008 tweets), which aligns with the linguistic preferences of its primary audience.
The emoji analysis shows that certain emojis appear frequently and can be leveraged to increase engagement. Popular emojis such as 🎓 (graduation cap) and 🚀 (rocket) signal academic achievements and dynamic growth, and they resonate well with the audience.
# Language Analysis
tweets %>%
filter(university == "bfh") %>%
group_by(university) %>%
count(lang) %>%
arrange(desc(n))
## # A tibble: 9 × 3
## # Groups: university [1]
## university lang n
## <chr> <chr> <int>
## 1 bfh de 3008
## 2 bfh en 135
## 3 bfh fr 35
## 4 bfh qam 8
## 5 bfh da 2
## 6 bfh es 2
## 7 bfh lt 2
## 8 bfh it 1
## 9 bfh zxx 1
# Emoji Analysis
# top_n_emojis() is a helper function, not part of base R or the tidyverse.
emoji_count <- tweets %>%
filter(university == "bfh") %>%
top_n_emojis(full_text)
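For reference, the core of such an emoji counter can be sketched in a few lines of base R: extract all emoji-range characters from the texts, tabulate them, and keep the most frequent. The regular expression and the cutoff of 20 are assumptions for illustration, not the actual `top_n_emojis()` implementation.

```r
# Minimal sketch of an emoji counter (illustrative, not the real helper).
count_emojis <- function(texts, top_n = 20) {
  # Match characters in the main emoji Unicode blocks (an approximation).
  matches <- regmatches(texts, gregexpr("[\U0001F300-\U0001FAFF\u2600-\u27BF]",
                                        texts, perl = TRUE))
  tab <- sort(table(unlist(matches)), decreasing = TRUE)
  head(data.frame(emoji = names(tab), n = as.integer(tab)), top_n)
}

count_emojis(c("Wir gratulieren! \U0001F393\U0001F680",
               "Neuer Studiengang \U0001F393"))
```

In practice the counts would feed straight into the ggplot call below.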
emoji_count %>%
mutate(emoji_name = reorder(emoji_name, n)) %>%
ggplot(aes(n, emoji_name)) +
geom_col(fill = "#37556E") +
labs(
x = "Count",
y = "Emoji",
title = "Top 20 Emojis Used by BFH"
) +
theme_minimal()
heatmap_data_bfh <- tweets %>%
filter(university == "bfh") %>%
count(day, timeofday_hour) %>%
complete(day, timeofday_hour, fill = list(n = 0))
# Heatmap of tweet counts by day and hour: posting clusters clearly within working hours.
ggplot(heatmap_data_bfh, aes(x = timeofday_hour, y = day, fill = n)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(title = "Heatmap of Tweet Activity", x = "Hour", y = "Day") +
theme_minimal()
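Rather than reading the peak off the heatmap by eye, the busiest day/hour cell can be extracted programmatically. This sketch uses simulated counts standing in for `heatmap_data_bfh`; the column names mirror the real data.

```r
# Sketch: find the busiest (day, hour) cell from a count table.
# Simulated stand-in for heatmap_data_bfh.
counts <- expand.grid(day = c("Mon", "Tue", "Wed"), timeofday_hour = 8:10)
counts$n <- c(4, 9, 2, 7, 15, 3, 5, 8, 1)

# Row with the maximum tweet count.
peak <- counts[which.max(counts$n), ]
peak  # the day/hour combination with the most tweets
```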
engagement_hour_bfh <- tweets %>%
filter(university == "bfh") %>%
group_by(timeofday_hour) %>%
summarise(avg_engagement = mean(weighted_engagement, na.rm = TRUE))
# Engagement by hour peaks between 4 pm and 8 pm, with a smaller peak around midday.
ggplot(engagement_hour_bfh, aes(x = timeofday_hour, y = avg_engagement)) +
geom_bar(stat = "identity", fill = "#37556E") +
labs(title = "Average Engagement by Hour", x = "Hour", y = "Avg Engagement") +
theme_minimal()
plot_data_bfh <- tweets_filtered %>%
filter(university == "bfh") %>%
group_by(month) %>%
summarize(mean_sentiment = mean(sentiment, na.rm = TRUE))
ggplot(plot_data_bfh, aes(x = month, y = mean_sentiment)) +
geom_line() +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(
title = "Mean Syuzhet Sentiment Over Time by BFH",
y = "Mean Sentiment Score",
x = "Month"
) +
theme_minimal()
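The red `geom_smooth(method = "lm")` line only visualizes a linear trend; to report it numerically, the same model can be fitted directly with `lm()`. The sketch below uses simulated monthly sentiment values in place of `plot_data_bfh`.

```r
# Sketch: quantify the sentiment trend instead of only plotting it.
# Simulated stand-in for plot_data_bfh (month index + mean sentiment).
trend_data <- data.frame(
  month = 1:12,
  mean_sentiment = c(0.10, 0.12, 0.09, 0.15, 0.14, 0.18,
                     0.17, 0.20, 0.19, 0.22, 0.21, 0.25)
)

# The same linear model that geom_smooth(method = "lm") draws.
fit <- lm(mean_sentiment ~ month, data = trend_data)

# Slope = average change in mean sentiment per month.
slope <- coef(fit)[["month"]]
round(slope, 4)  # 0.0131
```

A positive slope confirms the upward drift the smoothed line suggests.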
insights <- list(
"Most Active Hours" = hours_with_most_tweets_by_uni,
"Most Active Days" = days_with_most_tweets_by_uni,
"Content Analysis" = head(words_freqs_de),
"Sentiment Analysis" = head(tweets_filtered$sentiment)
)
Based on the analysis, the following key insights and recommendations can be derived to enhance BFH’s communication strategy:
### Key Insights:
Schedule tweets during the most active hours (8 AM, 4 PM, and 8 PM) to maximize engagement. Consider aligning important announcements and updates with these times for better visibility.
Leverage Tuesdays for critical updates and major announcements to take advantage of the higher activity levels. Additionally, posting on Fridays and Sundays can also result in higher engagement.
Implement a dashboard to track tweet performance, including engagement metrics, sentiment scores, and trending topics. This allows for real-time adjustments to the communication strategy.
Generate weekly or monthly reports summarizing key metrics and insights. This helps the team stay informed about content performance and areas for improvement.
Develop a content calendar that aligns tweet releases with peak engagement times and days. Incorporate findings from sentiment and topic analyses to plan relevant and engaging content.
Establish a feedback loop where the communication team reviews analytics data and adjusts the strategy accordingly. Regular team meetings to discuss insights can foster a data-driven approach to communication.
Develop a tool that predicts the best times to tweet based on historical data, optimizing tweet scheduling for maximum engagement.
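A first version of such a tool can be very simple: rank posting hours by mean historical engagement and recommend the top ones. The sketch below uses a small simulated history; in practice the tweets data prepared above would be the input.

```r
# Sketch of a best-time recommender: rank hours by mean historical engagement.
# Simulated posting history (hour of day + engagement per tweet).
history <- data.frame(
  hour       = c(8, 8, 12, 12, 16, 16, 16, 20, 20, 23),
  engagement = c(5, 7,  3,  4,  9, 11, 10,  8,  6,  1)
)

# Mean engagement per hour, sorted from best to worst.
by_hour <- aggregate(engagement ~ hour, data = history, FUN = mean)
by_hour <- by_hour[order(-by_hour$engagement), ]

# Recommend the top 3 posting hours.
best_hours <- head(by_hour$hour, 3)
best_hours  # 16 20 8
```

A production version would weight recent tweets more heavily and split by weekday.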
Implement an automated system to analyze the sentiment of drafts before posting, ensuring an appropriate and positive tone.
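Such a pre-check can reuse the word-level sentiment approach shown in the scored word lists above: look each draft word up in a lexicon and sum the scores. The mini lexicon here is illustrative only, not the one used in the analysis.

```r
# Sketch of a draft pre-check: score a draft tweet against a word-level
# sentiment lexicon (illustrative entries, echoing the tables above).
lexicon <- c(krise = -1, problem = -1, druck = -1,
             erfolg = 1, gratulieren = 2, innovation = 1)

score_draft <- function(text) {
  # Split on non-letter characters, lowercase, sum matched lexicon scores.
  words <- tolower(unlist(strsplit(text, "[^[:alpha:]]+")))
  sum(lexicon[words], na.rm = TRUE)
}

score_draft("Wir gratulieren zum Erfolg!")  # 3: positive tone, fine to post
score_draft("Krise und Druck nehmen zu")    # -2: flags a negative tone
```

A draft scoring below some threshold could be routed back for rewording before it is published.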
Create a feature that identifies emerging topics and trends in real-time, allowing the communication team to quickly adapt and incorporate relevant themes into their messaging.
By integrating these recommendations and tools, BFH can enhance its communication strategy, ensuring that its messages are timely, relevant, and engaging for its audience.